Abstract

We propose a Bayesian methodology for one-mode projecting a bipartite network that is observed across a
series of discrete time steps. The resulting one-mode network captures the uncertainty over the presence or absence of
each link and provides a probability distribution over its possible weight values. Additionally, the incorporation of prior
knowledge from previous states makes the resulting network less sensitive to the noise and missing observations that usually
arise during data collection. The methodology consists of computationally inexpensive update rules and
scales to large problems via an appropriate distributed implementation.

1 Introduction 

A bipartite or two-mode network is a graph $\mathcal{G} = \{U, V, \mathcal{E}\}$ with two sets of nodes, $U$ and $V$, where
connections $\mathcal{E}$ exist only between nodes that belong to different sets. The overall connectivity is described by the
$N \times K$ incidence matrix $\mathbf{B}$, where $N = |U|$, $K = |V|$, and $b_{ik} = 1$ if there exists a link¹ between a given
pair of nodes $i, k$ with $i \in U$, $k \in V$, and zero otherwise. Bipartite networks describe a diverse range of complex
systems: scientific collaboration networks [10], animal visitation patterns to various sites [15][20], gene-to-disease
associations [3], Social Media [4], product co-purchasing networks [6], and many more [11][22].

One-mode projection is the operation where a bipartite network $\mathcal{G} = \{U, V, \mathcal{E}\}$ described by the
$N \times K$ incidence matrix $\mathbf{B}$ is mapped to a graph with only one class of nodes, $\mathcal{G}_U = \{U, \mathcal{E}_U\}$,
via $\mathbf{B}: N \times K \mapsto \mathbf{W}: N \times N$. The new connections are now placed between nodes of the set $U$,
which we shall call from now on the "source" set, based on the way they linked to nodes of the vanished "target" set. The
most trivial way to build the adjacency matrix $\mathbf{W}$ of $\mathcal{G}_U$ would be to set $w_{ij} = 1$ if nodes
$i, j \in U$ have at least one common target $k$ in $\mathcal{G}$ and zero otherwise [21]. A reasonable refinement [13] involves
setting the weight of each link to the total number of common targets, or co-occurrences,
$w_{ij} = \sum_{k=1}^{K} b_{ik} b_{jk}$, that $i$ and $j$ have across nodes in $V$. In matrix terms, this is achieved by
$\mathbf{W} = \mathbf{B}\mathbf{B}^{\top}$. Further extensions have been considered, such as moderating the projected link
weight $w_{ij}$ by taking into account the exclusivity of co-occurrences [11] or by introducing a saturation function [7].
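As a minimal sketch of the weighted projection $\mathbf{W} = \mathbf{B}\mathbf{B}^{\top}$ described above, the following plain-Python snippet computes the co-occurrence counts for a small boolean incidence matrix. The helper name `project_cooccurrence` and the 3 x 4 example matrix are illustrative assumptions, not part of the original work.

```python
def project_cooccurrence(B):
    """Weighted one-mode projection: W[i][j] = sum_k B[i][k] * B[j][k]."""
    n = len(B)
    return [
        [sum(b_ik * b_jk for b_ik, b_jk in zip(B[i], B[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3 x 4 boolean incidence matrix (rows: sources, columns: targets).
B_example = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

W_example = project_cooccurrence(B_example)
# The diagonal entry W[i][i] equals the degree of source node i.
```

Note that every pair of sources sharing a single target already receives a non-zero weight, which is the densification effect discussed next.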

It is worth noting that the typical one-mode projection $\mathbf{W} = \mathbf{B}\mathbf{B}^{\top}$ forces all nodes from $U$
that point to a particular target node $k \in V$ to form a fully connected subgraph. Thus each target node $k$ corresponds to
a $d_k$-clique in $\mathcal{G}_U$, where $d_k = \sum_{i=1}^{N} b_{ik}$ is the degree of $k$. Due to the heavy-tailed degree
distribution on the target set [9], there is a non-trivial number of nodes in $V$ with a degree $d_k$ so high that the projected
network becomes almost fully connected [5]. Methods that can be employed to regulate such densification in the resulting
graph $\mathcal{G}_U$ range from information filtering [17, 19] to defining appropriate null models that examine the statistical
significance of the observed weights [15] or network motifs [22].

In the present work, we seek to one-mode project a temporal bipartite network
$\mathcal{G}^{(t)} = \{U, V, \mathcal{E}^{(t)}\}$, $t \in \{1, \dots, T\}$, that is described by a sequence of incidence matrices
$\{\mathbf{B}^{(t)}\}_{t=1}^{T}$. The one-mode projection at any given time point $t$ captures the associations between nodes
$i, j \in U$ by taking into account past and present link information from all steps 1 to $t$. We require that all projected
connections between nodes $i, j$ are appropriately weighted so that we take into account both the strength and the
statistical significance of the association. Finally, we seek to model the uncertainty over the resulting topology, by
placing probability distributions over the presence of each link. The model is formally presented in Section 2 and its
application in Section 3. In Section 4 we conclude with a short discussion and our roadmap for future applications and
theoretical extensions.

* ioannis.psorakis@eng.ox.ac.uk

¹ Although $b_{ik}$ may take any value in $\mathbb{R}$, denoting weight or participation strength, in this work we will consider only boolean incidence matrices.

2 Bayesian one-mode projection



2.1 Problem statement 

Consider a setting where we observe a temporal bipartite network as a sequence of "snapshots"
$\{\mathbf{B}^{(t)}\}_{t=1}^{T}$, where each incidence matrix $\mathbf{B}^{(t)}: N \times K$ describes the linkage of $N$ agents
to $K$ targets². Such a dataset may describe the buying habits of $N$ customers who each month $t$ make purchases among a
set of $K$ products, or the daily mobility patterns of $N$ Social Media users who perform check-ins at $K$ locations. Our key
assumption is that there is an underlying association or similarity network $\mathbf{\Pi} \in \mathbb{R}^{N \times N}$ between
agents, which directly affects the structure of $\{\mathbf{B}^{(t)}\}_{t=1}^{T}$ in the sense that very "similar" agents
consistently point to the same targets and vice-versa. Our goal is to learn the structure of
$\mathbf{\Pi} \in \mathbb{R}^{N \times N}$ at every step $t$, by defining a Bayesian model that captures our belief over the
presence (or absence) of each link along with a probability distribution over its connection strength.



2.2 Probabilistic model for graph links 



Given the observation sequence $\{\mathbf{B}^{(t)}\}_{t=1}^{T}$ described in Section 2.1, let us isolate one particular
timestamp $t$, so that $\mathbf{B} = \mathbf{B}^{(t)}$. Each element $b_{ik}$ is 1 if agent $i$ links to target $k$ and zero
otherwise, while the sum $d_i = \sum_{k=1}^{K} b_{ik}$ is the total number of targets, or out-degree, of $i$. Let us now define
an additional variable $x_{ij}$ that we will call opportunities, which is the number of target nodes that either $i$ or $j$
link to. It is obtained by performing an element-by-element logical disjunction on the rows of $\mathbf{B}$ and summing the
elements of the resulting vector:

$$x_{ij} = \sum_{k=1}^{K} \mathrm{OR}(b_{ik}, b_{jk}) \tag{1}$$
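The opportunities of Eq. (1) can be sketched in a few lines of plain Python. The function name `opportunities` and the example matrix are hypothetical and chosen only for illustration.

```python
def opportunities(B):
    """x[i][j] = number of targets that i OR j link to, as in Eq. (1)."""
    n = len(B)
    return [
        [sum(1 for b_ik, b_jk in zip(B[i], B[j]) if b_ik or b_jk) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3 x 4 boolean incidence matrix (rows: sources, columns: targets).
B_toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

X_toy = opportunities(B_toy)
degrees = [sum(row) for row in B_toy]
```

By inclusion-exclusion, for boolean matrices the same quantity equals $d_i + d_j - w_{ij}$, so it can also be derived from the degrees and the co-occurrence counts without a second pass over the targets.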



A list of variables used in this paper is presented in Table 1.

Table 1: Notation

Variable                      Interpretation
N                             # of source nodes.
K                             # of target nodes.
B                             N x K incidence matrix of bipartite graph.
W                             N x N projection matrix.
$w_{ij}$                      # of co-occurrences of agents i and j.
$d_i$                         degree of agent i based on B.
$x_{ij}$                      # of targets linked to by either i or j (opportunities).
$\pi_{ij}$                    $\in [0,1]$ attraction coefficient of i, j.
$\alpha_{ij}, \beta_{ij}$     Beta distribution parameters.



Given the observed $N \times K$ incidence matrix $\mathbf{B}$, we begin by performing the standard weighted one-mode
projection, getting the co-occurrence matrix $\mathbf{W} = \mathbf{B}\mathbf{B}^{\top}$. Each
$w_{ij} = \sum_{k=1}^{K} b_{ik} b_{jk}$ represents integer-valued counts that we can model as a draw from a binomial
distribution:

$$w_{ij} \sim \mathrm{Binom}(\pi_{ij}; x_{ij}), \tag{2}$$

with two parameters: the number of opportunities $x_{ij}$ and a bias term $\pi_{ij} \in [0, 1]$ that corresponds to our
modelling assumption that there is a latent attraction coefficient between all pairs $i, j$, which controls the extent to
which opportunities $x_{ij}$ are manifested as co-occurrences $w_{ij}$ across targets. We view $\pi_{ij}$ as a measure of
similarity or association between $i$ and $j$, and it is the key variable in our model; the one-mode projection we propose is
a matrix $\mathbf{\Pi} \in \mathbb{R}^{N \times N}$ that contains all such $\pi_{ij}$.

Based on Eq. (2), the probability of observing a particular number of co-occurrences, or link weight, $w_{ij}$ is given by:

$$P(w_{ij} \mid \pi_{ij}, x_{ij}) = \binom{x_{ij}}{w_{ij}} \pi_{ij}^{w_{ij}} (1 - \pi_{ij})^{x_{ij} - w_{ij}}, \tag{3}$$

which is the likelihood function of the observed weights $w_{ij}$. As our inference task is to describe the attraction
coefficient $\pi_{ij}$ given the known $w_{ij}, x_{ij}$, a first approach would consist of maximising Eq. (3) w.r.t. $\pi_{ij}$.
The trivial maximum likelihood (ML) solution to Eq. (3) yields $\pi_{ij} = w_{ij}/x_{ij}$, which is inconvenient for the
following reasons:

• it makes our model sensitive to degenerate values of $w_{ij}$ and $x_{ij}$ that result from noisy observations of $\mathbf{B}$.

• it provides a point estimate of $\pi_{ij}$, thus not capturing the uncertainty on the attraction coefficient due to noise and
missing observations.

• it does not provide us with a systematic framework for learning $\pi_{ij}$ by exploiting both past and future observations of
the bipartite network.

² For the sake of simplicity, from now on we will assume that $N$ and $K$ are fixed for each $t$, although such a constraint can be relaxed.

To overcome the above difficulties, we employ a Bayesian approach and work with the probability distribution over
$\pi_{ij}$: we start with an initial $P(\pi_{ij})$ and revise it at each step $t$ as we observe new values for $w_{ij}$ and $x_{ij}$.

Recall that in Eq. (2) and (3) we have stated that the co-occurrences of $i$ and $j$ depend on the opportunities and the
attraction coefficient. This can be expressed via the graphical model in Fig. 1, where such probabilistic dependencies are
stated via arrows from nodes $x_{ij}$ and $\pi_{ij}$ pointing to $w_{ij}$. This allows us to express the probability of
$\pi_{ij}$ as:

$$P(\pi_{ij} \mid w_{ij}, x_{ij}) = \frac{P(w_{ij} \mid \pi_{ij}, x_{ij}) \, P(\pi_{ij})}{\int_{0}^{1} P(w_{ij}, \pi_{ij} \mid x_{ij}) \, d\pi_{ij}}, \tag{4}$$

where $P(\pi_{ij})$ is the prior and expresses our belief on how the attraction coefficient for $i, j$ varies before observing
$w_{ij}$ and $x_{ij}$. On the other hand, the posterior $P(\pi_{ij} \mid w_{ij}, x_{ij})$ is the revised belief on $\pi_{ij}$ in
the light of these observations. Because $\pi_{ij} \in [0, 1]$ we can model $P(\pi_{ij})$ as a Beta distribution:

$$\pi_{ij} \sim \mathrm{Beta}(\alpha_{ij}, \beta_{ij}), \tag{5}$$

parameterised by $\alpha_{ij}$ and $\beta_{ij}$, so that:

$$P(\pi_{ij}) = \frac{\pi_{ij}^{\alpha_{ij} - 1} (1 - \pi_{ij})^{\beta_{ij} - 1}}{B(\alpha_{ij}, \beta_{ij})}, \tag{6}$$

where $B(\cdot, \cdot)$ denotes the beta function.

Figure 1: Our graphical model, expressing how the observed (shaded circle) co-occurrences $w_{ij}$ between individuals $i$
and $j$ depend on the number of opportunities $x_{ij}$ (targets that either $i$ or $j$ link to in the original bipartite graph)
and an unobserved (unshaded circle) attraction coefficient $\pi_{ij}$. The square plates denote deterministic parameters of
the model.

Having an analytic structure for our prior in Eq. (6), we combine it with the likelihood from Eq. (3) based on Eq. (4)
to get the posterior:

$$P(\pi_{ij} \mid w_{ij}, x_{ij}) = \mathrm{Beta}(\alpha_{ij} + w_{ij}, \; \beta_{ij} + x_{ij} - w_{ij}), \tag{7}$$

which is a revised Beta distribution over $\pi_{ij}$, with updated parameters:

$$\alpha'_{ij} = \alpha_{ij} + w_{ij} \tag{8}$$
$$\beta'_{ij} = \beta_{ij} + x_{ij} - w_{ij} \tag{9}$$

The posterior distribution $P(\pi_{ij} \mid w_{ij}, x_{ij}) = \mathrm{Beta}(\alpha'_{ij}, \beta'_{ij})$ provides all the
information we need to describe the attraction coefficient $\pi_{ij}$, capturing the uncertainty over each possible value in
$[0, 1]$, while all dependencies between links are encoded in the $w_{ij}$ and $x_{ij}$ terms. In order to build the projection
matrix $\mathbf{\Pi} \in \mathbb{R}^{N \times N}$, point estimates can be directly derived from the posterior. For this
particular study we have used the expected value $\mathbb{E}(\pi_{ij}) = \frac{\alpha'_{ij}}{\alpha'_{ij} + \beta'_{ij}}$ for
each element of $\mathbf{\Pi} \in \mathbb{R}^{N \times N}$.
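The conjugate update of Eqs. (8)-(9) and the posterior mean used for the projection can be sketched as follows. The function names `beta_update` and `beta_mean`, and the example counts, are illustrative assumptions rather than anything prescribed by the text.

```python
def beta_update(alpha, beta, w, x):
    """Conjugate Beta update, Eqs. (8)-(9): returns (alpha', beta')."""
    if not 0 <= w <= x:
        raise ValueError("co-occurrences must satisfy 0 <= w <= x")
    return alpha + w, beta + x - w


def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical single step: prior Beta(10, 10), w = 2 co-occurrences out of x = 3 opportunities.
alpha_post, beta_post = beta_update(10.0, 10.0, w=2, x=3)
posterior_mean = beta_mean(alpha_post, beta_post)   # 12 / 23
ml_estimate = 2 / 3                                  # w / x, for comparison

# Degenerate observation with no opportunities: w / x is undefined,
# while the Bayesian estimate simply stays at the prior mean.
alpha_same, beta_same = beta_update(10.0, 10.0, w=0, x=0)
```

The posterior mean moves from the prior value 0.5 towards the ML estimate $w_{ij}/x_{ij}$, but only as far as the amount of evidence justifies.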

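Having from Eq. (7) a fully probabilistic formulation for the attraction coefficient, we can proceed one step
further and "integrate out" $\pi_{ij}$ from the likelihood function in Eq. (3) in order to obtain the probability
distribution over the connection weight $w_{ij}$:

$$
\begin{aligned}
P(w_{ij} \mid x_{ij}, \alpha'_{ij}, \beta'_{ij})
&= \int_{0}^{1} P(w_{ij}, \pi_{ij} \mid x_{ij}, \alpha'_{ij}, \beta'_{ij}) \, d\pi_{ij} \\
&= \int_{0}^{1} P(w_{ij} \mid \pi_{ij}, x_{ij}) \, P(\pi_{ij} \mid \alpha'_{ij}, \beta'_{ij}) \, d\pi_{ij} \\
&= \binom{x_{ij}}{w_{ij}} \frac{B(w_{ij} + \alpha'_{ij}, \; x_{ij} - w_{ij} + \beta'_{ij})}{B(\alpha'_{ij}, \beta'_{ij})},
\end{aligned}
\tag{10}
$$

which is a Beta-binomial probability mass function, where $B(\cdot, \cdot)$ is the standard beta function. Such a distribution
captures the variability of co-occurrences $w_{ij}$ given our noise model. From the above equation we can estimate the
expected value for the weights as $\mathbb{E}(w_{ij}) = \frac{x_{ij} \, \alpha'_{ij}}{\alpha'_{ij} + \beta'_{ij}}$.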
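As a rough numerical check of the Beta-binomial form in Eq. (10), the sketch below evaluates the probability mass function through log-gamma functions from the Python standard library. The helper names and the parameter values ($x = 4$, $\alpha' = 12$, $\beta' = 11$) are assumptions made for this example only.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_binom(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_pmf(w, x, alpha, beta):
    """P(w | x, alpha, beta) as in Eq. (10)."""
    return exp(log_binom(x, w) + log_beta(w + alpha, x - w + beta) - log_beta(alpha, beta))


def beta_binomial_mean(x, alpha, beta):
    """Expected number of co-occurrences, x * alpha / (alpha + beta)."""
    return x * alpha / (alpha + beta)


# Hypothetical values: 4 opportunities, posterior Beta(12, 11).
pmf = [beta_binomial_pmf(w, 4, 12.0, 11.0) for w in range(5)]
mean_from_pmf = sum(w * p for w, p in enumerate(pmf))
```

The probabilities over $w = 0, \dots, x$ sum to one, and the mean computed from them agrees with the closed-form expression for $\mathbb{E}(w_{ij})$.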

In this section we have described the theoretical foundation of our model along with the one-mode projection scheme
for a single learning step $t$. The full process involves cycling through the update equations:

$$\alpha_{ij}^{(t)} = \alpha_{ij}^{(t-1)} + w_{ij}^{(t)} \tag{11}$$
$$\beta_{ij}^{(t)} = \beta_{ij}^{(t-1)} + x_{ij}^{(t)} - w_{ij}^{(t)} \tag{12}$$

and revising our distributions over the attraction coefficients and link weights. Details for the full learning scheme are
presented in the following section.



2.3 Algorithmic and implementation details 

Consider the state of the system at time $t = 0$, before receiving the first network "frame" $\mathbf{B}^{(1)}$. At this stage,
we have no observations regarding the bipartite graph, and any prior beliefs on the agent pair associations $i, j$ are encoded
in the Beta parameters $\alpha_{ij}^{(0)}, \beta_{ij}^{(0)}$. These can be initialised, for example, to vanilla values
$\alpha_{ij}^{(0)} = \beta_{ij}^{(0)} = 10$ that center the attraction coefficients $\pi_{ij}^{(0)}$ around 0.5.

Upon receiving the first $\mathbf{B}^{(1)}$ we calculate the opportunities $x_{ij}$ and then the co-occurrences
$w_{ij} = \sum_{k=1}^{K} b_{ik} b_{jk}$ for all $i, j$. We then update $\alpha_{ij}, \beta_{ij}$ based on Eq. (11) and (12),
and construct the projection matrix $\mathbf{\Pi}$ using the expected values $\mathbb{E}(\pi_{ij})$. The whole process is
presented in Algorithm 1.



Algorithm 1 Bayesian One-Mode Projection

Require: bipartite sequence $\{\mathbf{B}^{(t)}\}_{t=1}^{T}$

  Initialize $\alpha_{ij}^{(0)}, \beta_{ij}^{(0)} \; \forall i, j \in \{1, \dots, N\}$
  for $t = 1$ to $T$ do
    Set $\mathbf{B} = \mathbf{B}^{(t)}$
    Get opportunities $x_{ij}^{(t)}$ from Eq. (1)
    Get co-occurrences via $\mathbf{W}^{(t)} = \mathbf{B}\mathbf{B}^{\top}$
    for $i, j \in \{1, \dots, N\}$ do
      update $\alpha_{ij}^{(t)}$ from Eq. (8)
      update $\beta_{ij}^{(t)}$ from Eq. (9)
    end for
  end for
  return $\mathbf{\Pi}^{(t)} = [\mathbb{E}^{(t)}(\pi_{ij})]_{i,j \in \{1, \dots, N\}}, \; \forall t \in \{1, \dots, T\}$

The computational cost of Algorithm 1 can be moderated via an appropriate distributed implementation. Non-holistic
matrix operations such as the multiplication $\mathbf{B}\mathbf{B}^{\top}$ can be parallelised (examples for Map-Reduce are
shown in [18]), while the $\alpha_{ij}, \beta_{ij}$ updates for each pair $i, j$ can be performed at different processing units.
The benign computational scalability of the method rests on the structure of the probabilistic model itself: the conjugacy
of our Beta prior in Eq. (6) with our Binomial likelihood function in Eq. (3) makes the posterior in Eq. (7) an updated Beta,
thus no sampling schemes (such as Markov Chain Monte Carlo) need to be employed. In the next section we describe an
application of the above in a working example, using an artificially generated dataset.
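A minimal, non-distributed sketch of Algorithm 1 in plain Python is given below. It assumes boolean snapshots stored as lists of rows, uses the hypothetical function name `bayesian_one_mode_projection`, and treats every ordered pair (including $i = j$) in the same way, which is a simplification made for this example.

```python
def bayesian_one_mode_projection(snapshots, alpha0=10.0, beta0=10.0):
    """Return one N x N matrix of posterior means E[pi_ij] per snapshot."""
    n = len(snapshots[0])
    alpha = [[alpha0] * n for _ in range(n)]
    beta = [[beta0] * n for _ in range(n)]
    projections = []
    for B in snapshots:
        for i in range(n):
            for j in range(n):
                # Co-occurrences: targets that both i and j link to.
                w = sum(b_ik * b_jk for b_ik, b_jk in zip(B[i], B[j]))
                # Opportunities: targets that either i or j link to, Eq. (1).
                x = sum(1 for b_ik, b_jk in zip(B[i], B[j]) if b_ik or b_jk)
                alpha[i][j] += w          # Eq. (11)
                beta[i][j] += x - w       # Eq. (12)
        projections.append(
            [[alpha[i][j] / (alpha[i][j] + beta[i][j]) for j in range(n)] for i in range(n)]
        )
    return projections


# Hypothetical stream: the same 3 x 3 snapshot observed twice.
snapshot = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
Pi_sequence = bayesian_one_mode_projection([snapshot, snapshot])
Pi_final = Pi_sequence[-1]
```

Each pair update touches only its own $\alpha_{ij}, \beta_{ij}$, so the inner loops are the part that could be spread across processing units.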

3 An illustrative example 



[Figure 2 panels: (a) Posterior density over link presence between nodes 1 & 2, across time steps; (b) Expected value of
attraction coefficient $\mathbb{E}[\pi_{12}]$ across time steps; (c) Observed versus expected link weight between nodes 1 & 2,
with legend entries $\mathbb{E}[w_{12}]$, observed $w_{12}$ and $x_{12}$.]

Figure 2: We plot various association metrics for the node pair 1-2, across 100 time steps. In Fig. 2(a) we illustrate how
the distribution over the attraction coefficient $\pi_{12}$ changes. Note that as we progress through time, not only does the
expected value increase, as shown in Fig. 2(b), but probability mass also becomes more concentrated around the mode, giving
more confident estimates due to accumulated data. In Fig. 2(c) we show how, by taking into account the distribution on
$\pi_{12}$, we obtain smoother estimates (blue line) of the number of co-occurrences in contrast to the observed ones (red dots).



3.1 Artificial data generation scheme 

Consider the following $5 \times 4$ incidence matrix, which encodes the topology of a toy bipartite network:

$$
\mathbf{B} =
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1
\end{pmatrix}
$$

Using the above matrix as a template, we generate a noise-contaminated sequence of bipartite networks
$\{\mathbf{B}^{(t)}\}_{t=1}^{T}$ based on a seed matrix:

$$
\mathbf{B}_{\mathrm{seed}} =
\begin{pmatrix}
0.80 & 0.90 & 0 & 0 \\
0.90 & 0.70 & 0 & 0 \\
0.90 & 0.80 & 0.90 & 0.90 \\
0 & 0 & 0.80 & 0.90 \\
0 & 0 & 0.60 & 0.90
\end{pmatrix}
$$

For each day $t \in \{1, \dots, 100\}$, we generate $\mathbf{B}^{(t)}$ by iterating through all pairs $i, k$ and setting
$b_{ik}^{(t)} = 1$ with probability of success $[\mathbf{B}_{\mathrm{seed}}]_{ik}$. Such a temporal bipartite stream may
represent $N = 5$ products that are allocated every day to $K = 4$ shopping baskets, during a period of $T = 100$ days. At the
end of each day $t$, we receive a new basket log $\mathbf{B}^{(t)}$. Product pairs 1-2 and 4-5 tend to be co-purchased quite
frequently (possibly because they complement each other for a particular task), while product 3 tends to appear across all
baskets due to its general popularity (e.g. milk). For each day $t$, our goal is to learn the associations between each pair
$i, j$, by exploiting current and past observations.
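The generation scheme above can be sketched with the standard-library random module. The seed values follow the matrix in this section, with the cells left blank in the template taken to be zero (an assumption of this sketch); the function names `sample_snapshot` and `generate_stream` and the fixed random seed are likewise hypothetical.

```python
import random

# Seed probabilities; zero entries are assumed for the blank cells of the template.
B_SEED = [
    [0.80, 0.90, 0.00, 0.00],
    [0.90, 0.70, 0.00, 0.00],
    [0.90, 0.80, 0.90, 0.90],
    [0.00, 0.00, 0.80, 0.90],
    [0.00, 0.00, 0.60, 0.90],
]


def sample_snapshot(seed, rng):
    """Draw one boolean incidence matrix: b_ik = 1 with probability seed[i][k]."""
    return [[1 if rng.random() < p else 0 for p in row] for row in seed]


def generate_stream(seed, T, rng_seed=0):
    """Generate T independent noisy snapshots from the same seed matrix."""
    rng = random.Random(rng_seed)
    return [sample_snapshot(seed, rng) for _ in range(T)]


stream = generate_stream(B_SEED, T=100)
```

Because `rng.random()` returns values in $[0, 1)$, entries with zero seed probability are never switched on, so the noise here only removes links and does not add spurious ones.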




[Figure 3 panels: (a), (b); (c) Observed versus expected link weight between nodes 2 & 4. Horizontal axes: time steps 1 to 100.]

Figure 3: We plot various association metrics for the case of nodes 2-4, which do not have common targets in our sequence
of bipartite graphs. As in the case of 1-2, we get more confident predictions for $\pi_{24}$ in Fig. 3(a) and smooth
estimates for their number of co-appearances in Fig. 3(c). The difference in this example is that our posterior belief on
the attraction coefficient is dropping, as we do not observe any co-occurrences $w_{24}$ (red dots in Fig. 3(c)).



3.2 Applying the method 

We start at $t = 0$ by assuming no prior knowledge of any link structure. At this point we have not seen any observations,
so our model parameters must reflect our ignorance about the associations. Therefore for each pair $i, j$ we set
$\alpha_{ij}^{(0)} = \beta_{ij}^{(0)} = 10$, which gives $\mathbb{E}[\pi_{ij}^{(0)}] = 0.5 \; \forall i, j$, denoting that we are
unable to tell if there should be an association link between $i, j$ before seeing any data.

At $t = 1$ we have our first chunk of data, namely the incidence matrix $\mathbf{B}^{(1)}$. The opportunity values
$x_{ij}^{(1)}$ are retrieved from $\mathbf{B}^{(1)}$ via Eq. (1), while the co-occurrences $w_{ij}^{(1)}$ are retrieved via a
standard weighted one-mode projection of $\mathbf{B}^{(1)}$ to $\mathbf{W}^{(1)}$. These values of
$w_{ij}^{(1)}, x_{ij}^{(1)}$, along with $\alpha_{ij}^{(0)}, \beta_{ij}^{(0)}$ from initialisation, are all that we need to
describe the distribution of the attraction coefficients via the update equations (11) and (12). Once we have performed the
update, we can get a point estimate of the attraction coefficient (or association index in our example) via its expected
value under the posterior distribution,
$\mathbb{E}^{(1)}(\pi_{ij}) = \frac{\alpha_{ij}^{(1)}}{\alpha_{ij}^{(1)} + \beta_{ij}^{(1)}}$.

The posterior distribution over $w_{ij}^{(1)}$ (i.e. the number of baskets where both $i$ and $j$ co-appear) is obtained via
Eq. (10). It gives a richer description of the co-occurrences between $i$ and $j$ than the raw count, as it considers both
the current and prior observations, handles uncertainty in a principled manner, and is less sensitive to missing data.

In Fig. 2(a) we plot how the posterior distribution $P(\pi_{12} \mid w_{12}, x_{12}, \alpha_{12}, \beta_{12})$ of the
attraction coefficient $\pi_{12}$ progresses during each iteration. We can see that for $t = 0$ the distribution is our
symmetric prior centered around 0.5, because we have no evidence to support the presence of an association between nodes 1
and 2. As we start observing non-zero link weights $w_{12}$, this prior belief is updated in order to explain the incoming
data, effectively shifting the distribution so that more probability mass is concentrated around larger values of
$\pi_{12}$. Indeed, in Fig. 2(b) we plot the expected value of $\pi_{12}$ for each
$P(\pi_{12} \mid w_{12}, x_{12}, \alpha_{12}, \beta_{12})$ per iteration, where we can see the gradual increase of
$\mathbb{E}[\pi_{12}]$ from 0.5 to $\approx 0.68$. It is important to notice that the increase of $\mathbb{E}[\pi_{12}]$ is less
steep in later $t$, as the impact of new co-occurrences $w_{12}$ is not as strong as in the beginning of data collection. Such
a saturation or "diminishing returns" property arises naturally in Bayesian learning models, without the need to explicitly
induce it via additional machinery such as hyperbolic tangent functions [7].

Figure 4: We plot association metrics for the node pair 2-3. Although the pair has about the same number of co-occurrences
across $t$ as 1-2 (red dots in Fig. 4(c)), the estimated attraction coefficient decreases (see Fig. 4(a) and 4(b)) due to the
large number of opportunities $x_{23}$ (black line in Fig. 4(c)), which imply a lack of exclusivity in the observed
co-occurrences.



On the other hand, in Fig. 3(a) we plot the posterior distribution
$P(\pi_{24} \mid w_{24}, x_{24}, \alpha_{24}, \beta_{24})$ that expresses our belief on the presence of a link between nodes 2
and 4. As before, the distribution is initially centered around 0.5, due to the lack of evidence, while upon receiving data
it rapidly shifts towards small values, which can also be seen by plotting $\mathbb{E}[\pi_{24}]$ in Fig. 3(b). We also note
that although the distribution is flatter during the initial iterations, we are constantly making more and more confident
predictions (decreasing posterior entropy) as we keep observing $w_{24} = 0$.
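The growing confidence described above can be illustrated with the variance of the Beta posterior, used here as a simple stand-in for the posterior entropy (an assumption of this sketch, since the differential entropy needs the digamma function, which the Python standard library does not provide). The per-step counts $w = 0$, $x = 2$ are made-up values for a pair with no common targets.

```python
def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1.0))


alpha, beta = 10.0, 10.0                 # prior centered at 0.5
means = [alpha / (alpha + beta)]
variances = [beta_variance(alpha, beta)]
for _ in range(50):
    w, x = 0, 2                          # hypothetical step: no co-occurrence, two opportunities
    alpha, beta = alpha + w, beta + x - w
    means.append(alpha / (alpha + beta))
    variances.append(beta_variance(alpha, beta))
```

Under these assumed counts both the posterior mean and the posterior variance shrink at every step, matching the qualitative behaviour reported for the pair 2-4.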



Another point worth noting is the difference in behaviour of the observed link value $w_{ij}^{(t)}$ against the expected
value $\mathbb{E}[w_{ij}^{(t)}]$ under the posterior distribution
$P(w_{ij}^{(t)} \mid x_{ij}^{(t)}, \alpha_{ij}^{(t)}, \beta_{ij}^{(t)})$. In Fig. 2(c) and 3(c) we compare the observed
$w_{ij}^{(t)}$ (red dots) against $\mathbb{E}[w_{ij}^{(t)}]$, where we can see that the expected value of the posterior is a
smoother estimate of the link weight, being less sensitive to large fluctuations due to noise and missing observations.

Now let us examine how our method models associations between $i = 3$ and other nodes in the graph. Node 3 is an
exceptional case in our example, as it tends to appear in every gathering target (recall the example where the milk tends
to appear across all shopping baskets). If we look again at the seed matrix $\mathbf{B}_{\mathrm{seed}}$ we can see that for
every noisy realisation $\mathbf{B}^{(t)}$ of $\mathbf{B}$ in our artificial data stream, we have
$P(b_{3k}^{(t)} = 1) \geq 0.8, \; \forall k, t$. Therefore under a naive model, node 3 would likely have a high similarity with
all other nodes in the network, just because of its linkage to all targets in the original bipartite graph.



In Fig. 4(a) we plot the posterior density curves of $\pi_{23}$ across $t$, along with the expected values
$\mathbb{E}[\pi_{23}]$ (Fig. 4(b)) and $\mathbb{E}[w_{23}]$ (Fig. 4(c)), as we did in the previous cases above. We can see that
although we consistently observe non-zero link weights (red dots in Fig. 4(c)), the association probability or attraction
coefficient $\pi_{23}$ decreases across $t$ (seen in Fig. 4(b)). This is in complete disagreement with the case of
association between agents 1 and 2, where consistently non-zero observations (red dots in Fig. 2(c)) led to an increase in
the attraction coefficient $\pi_{12}$ (seen in Fig. 2(b)). What is the reason behind this inconsistency?

The reason why our posterior belief over the presence of the link 2-3 is reduced, although we observe non-zero
co-occurrences between individuals 2 and 3, is the role that the opportunities variable $x_{ij}$ plays in the model. Recall
Eq. (2):

$$w_{ij} \sim \mathrm{Binom}(\pi_{ij}; x_{ij}), \tag{13}$$

which says that the observed link weight $w_{ij}$ is the result of a latent attraction term $\pi_{ij}$ that controls how many
of the opportunities $x_{ij}$ are manifested as co-occurrences. These opportunities are the total number of targets that
either $i$ or $j$ link to. In the example of products 1 and 2, and based on $\mathbf{B}_{\mathrm{seed}}$, their placement in
baskets almost completely overlaps. Therefore, as we can see in Fig. 2(c), the observed number of co-occurrences $w_{12}$
(red dots) tends to match the number of opportunities $x_{12}$ (black dashed line), forcing the model to infer high
association values $\pi_{12}$.

On the other hand, although nodes 2 and 3 have values of $w_{23}$ similar to the observed link weights $w_{12}$ (red dots in
Fig. 4(c)), their number of opportunities $x_{23}$ (dashed black line in Fig. 4(c)) is much higher due to the participation
of agent 3 across all gathering targets. That leads our model, based on the binomial model in Eq. (2), to infer a lower value
of the attraction coefficient $\pi_{23}$, effectively penalising such lack of exclusivity in the co-appearances of 2 with 3.
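The effect of the opportunities on the inferred coefficient can be sketched numerically. The snippet below compares two hypothetical pairs that show the same number of co-occurrences at every step but differ in their opportunities; the counts ($w = 2$ with $x = 2$ versus $x = 4$), the 20 repetitions, and the function name are assumptions chosen for illustration, not values taken from the experiment.

```python
def posterior_mean_after(steps, w, x, alpha=10.0, beta=10.0):
    """Posterior mean of pi after `steps` identical observations (w, x)."""
    for _ in range(steps):
        alpha, beta = alpha + w, beta + x - w
    return alpha / (alpha + beta)


# "Selective" pair: every opportunity becomes a co-occurrence.
selective = posterior_mean_after(steps=20, w=2, x=2)    # 50 / 60

# "Gregarious" pair: same co-occurrences, but twice as many opportunities.
gregarious = posterior_mean_after(steps=20, w=2, x=4)   # 50 / 100
```

With identical link weights, the pair with more opportunities ends up with the lower attraction coefficient, because each unmatched opportunity adds to $\beta_{ij}$.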

This is a very attractive property of the model, which not only regulates the link weights between "gregarious" nodes
(that tend to link everywhere) and "selective" ones (with a small set of targets they point to), but also allows the model to
downplay the effect of purely coincidental co-appearances on the attraction coefficient, which would otherwise introduce
"junk" associations in the projection network.



3.3 Notes on changepoint detection 

We have described a probabilistic model that allows us to infer latent associations between source nodes based on their
common linkage to target nodes in a temporal bipartite network. Our approach consists of processing the data stream in
$T$ "chunks" and updating the model parameters based on new observations received at $t$ and prior knowledge from previous
steps $0, \dots, t - 1$. Such fusion of information from both current observations and past experience lies at the heart of
every Bayesian learning model and allows us to perform rigorous inference in real-world settings where noise and missing
observations are prevalent. Indeed, we have already shown in our artificial example of Fig. 2(c) that although the
observations (red dots) of the number of co-occurrences $w_{ij}$ between two agents fluctuate greatly, our probabilistic
treatment allows us to extract a smooth trend of how $w_{ij}$ progresses through time.

We have shown that our method learns the association patterns of nodes by making more and more confident predictions on
the model quantities of interest while being resilient to perturbations induced by noise. The question is, what happens in
cases where the underlying system changes dramatically at some given time point $t_c$, making all prior knowledge from
$t = 0, \dots, t_c - 1$ obsolete?

In order to illustrate this, let us revisit the example of Section 3, where at a given point $t_c$ we generate data from a
different seed matrix:

$$
\mathbf{B}'_{\mathrm{seed}} =
\begin{pmatrix}
0.90 & 0.80 & 0 & 0 \\
0 & 0 & 0.90 & 0.90 \\
0.80 & 0.90 & 0.80 & 0.90 \\
0.90 & 0.80 & 0 & 0 \\
0 & 0 & 0.90 & 0.80
\end{pmatrix}
$$

where in this case products 1 & 4 now co-appear in the first two baskets, 2 & 5 in the last two, while 3 remains common
across all $K = 4$ baskets.

Assume now that we run our methodology on a dataset of $T = 200$ instances of $\mathbf{B}^{(t)}$, where the first 100 are
generated from $\mathbf{B}_{\mathrm{seed}}$ (as in Section 3.1) and from $t_c = 101$ to $T$ we use the new
$\mathbf{B}'_{\mathrm{seed}}$ instead. In Fig. 5 we plot the expectation of the attraction coefficient
$\mathbb{E}[\pi_{ij}]$ for the pairs 1 & 2 and 1 & 4.

We can see that for the first 100 steps, before the changepoint, the model has the exact behaviour presented in
Section 3: as $w_{12}$ tends to be non-zero for $t < t_c$, the expected value of our posterior $\mathbb{E}[\pi_{12}]$
increases as we observe more data (blue line in Fig. 5). Similarly, as $w_{14} = 0$ for $t < t_c$ there is a steep drop of
$\mathbb{E}[\pi_{14}]$ towards zero (dashed red line in Fig. 5). For $t > t_c$, we can see that the model responds by slowly
shifting the posterior mean. In fact, even though the number of observations after the changepoint is the same as the one
before $t_c$, the model fails to reach an appropriate value in both cases; there exists a lot of prior knowledge that makes
the model expect new data that largely conform with the system state before $t_c$.

[Figure 5 panel: Expected attraction coefficient $\mathbb{E}[\pi_{ij}]$ across time steps for 2 pair cases, with the
changepoint marked. Horizontal axis: time steps 1 to 200.]

Figure 5: We demonstrate the responsiveness of the model in the presence of a particular changepoint. For the first 100
time steps, nodes 1 and 2 tend to point to the same targets in the bipartite network and the model behaves exactly as in
Fig. 2(b). After a particular step $t_c$, 1 & 2 stop having common targets, thus the attraction coefficient starts to drop.
The reason for such a slow drop after $t_c$ is that past observations (before $t_c$) are strongly weighted in the model.

The limitation described above is not a drawback of Bayesian learning in general; in fact, the model behaves exactly
as it should in settings where the underlying mechanism that generates the observations is stationary. By implying such
stationarity in the system, we are effectively making our model changepoint-naive and our inferences very conservative
when the status quo is changing. In order to handle such cases, we have to control the way prior and current information is
fused, by introducing a mixing coefficient $\kappa_{ij}$ that maps each update equation (8), (9) to a convex sum:

$$\alpha'_{ij} = \kappa_{ij} \, \alpha_{ij} + (1 - \kappa_{ij}) \, w_{ij} \tag{14}$$
$$\beta'_{ij} = \kappa_{ij} \, \beta_{ij} + (1 - \kappa_{ij}) (x_{ij} - w_{ij}) \tag{15}$$
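To make the effect of a mixing coefficient concrete, the sketch below contrasts the plain additive updates with a convex-sum variant of the form $\alpha' = \kappa \alpha + (1 - \kappa) w$ and $\beta' = \kappa \beta + (1 - \kappa)(x - w)$. This form, the value $\kappa = 0.9$, the function names and the synthetic observation sequence are all assumptions made for this example.

```python
def mixed_update(alpha, beta, w, x, kappa):
    """Convex-sum update: kappa weights the past, (1 - kappa) the new data."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    new_alpha = kappa * alpha + (1.0 - kappa) * w
    new_beta = kappa * beta + (1.0 - kappa) * (x - w)
    return new_alpha, new_beta


def run_filter(observations, kappa=None, alpha=10.0, beta=10.0):
    """Posterior means over time; kappa=None uses the plain additive updates."""
    means = []
    for w, x in observations:
        if kappa is None:
            alpha, beta = alpha + w, beta + x - w
        else:
            alpha, beta = mixed_update(alpha, beta, w, x, kappa)
        means.append(alpha / (alpha + beta))
    return means


# Hypothetical pair: full overlap for 100 steps, then no overlap for 100 steps.
observations = [(2, 2)] * 100 + [(0, 2)] * 100
plain = run_filter(observations)
mixed = run_filter(observations, kappa=0.9)
```

In this synthetic run the additive filter only drifts back to 0.5 by the final step, whereas the mixed filter tracks the new regime closely, at the price of forgetting older evidence much faster.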



At this stage, the parameter $\kappa$ is determined manually and depends on the application. To automate the selection of
$\kappa$, the simplest approach is to use a bank of filters, each with a different value of $\kappa$. The choice of $\kappa$
can then be determined using the probability of the next observation $w_{ij}^{(t+1)}$ under the posterior predictive
distribution in Eq. (10). While simple to implement, this approach could quickly become computationally expensive compared to
alternative approaches based on generalised linear models, which update the log-odds of the attraction coefficients using
dynamic linear model recursions [14]. Both approaches avoid ad-hoc heuristics for the selection of $\kappa$ and allow us to
detect changepoints and dynamically control the degree to which we mix past and present information in order to perform
optimal predictions.
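A bank of filters of this kind might be sketched as below: each candidate $\kappa$ runs its own filter and is scored by the accumulated log probability it assigns to each observation before seeing it, using the Beta-binomial form of Eq. (10). The convex-sum update, the candidate values, the small numerical floor on the parameters and all function names are assumptions of this sketch.

```python
from math import isfinite, lgamma


def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_predictive(w, x, alpha, beta):
    """Log Beta-binomial probability of w co-occurrences out of x opportunities."""
    log_choose = lgamma(x + 1) - lgamma(w + 1) - lgamma(x - w + 1)
    return log_choose + log_beta_fn(w + alpha, x - w + beta) - log_beta_fn(alpha, beta)


def score_kappa(observations, kappa, alpha=10.0, beta=10.0, floor=1e-6):
    """Sum of one-step-ahead log predictive probabilities for one filter."""
    total = 0.0
    for w, x in observations:
        total += log_predictive(w, x, alpha, beta)   # score before updating
        # Assumed convex-sum update; the floor keeps both parameters strictly positive.
        alpha = max(kappa * alpha + (1.0 - kappa) * w, floor)
        beta = max(kappa * beta + (1.0 - kappa) * (x - w), floor)
    return total


def select_kappa(observations, candidates):
    """Return the best-scoring kappa together with all scores."""
    scores = {kappa: score_kappa(observations, kappa) for kappa in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical pair with a changepoint after 30 steps.
observations = [(2, 2)] * 30 + [(0, 2)] * 30
best_kappa, scores = select_kappa(observations, [0.5, 0.9, 0.99])
```

The cost grows linearly with the number of candidates for every node pair, which is the expense the text refers to when comparing against dynamic linear model recursions.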



4 Discussion and future work 

In this report we have presented a probabilistic approach for the one-mode projection of temporal bipartite networks, in
order to infer latent associations between nodes of the class of interest. Such inferred associations $\pi_{ij}$ are
parameters of a binomial noise model, which can be viewed as link probabilities for all pairs of nodes, effectively mapping
the temporal bipartite network to an ensemble of possible graphs. This is a very attractive aspect of our method, as, along
with the distributions over $\pi_{ij}$, it fully captures the uncertainty over connectivity patterns. Additionally, the model
benefits from a constant influx of new information by updating our current beliefs over the network topology based on more
recent observations.

Capturing the uncertainty over graph links and weights is only one aspect of the method. The probability of connection
or association between any pair of nodes can also be used for link completion or personal recommendation tasks,
while macroscopic topological properties of the inferred networks can now be described in the form of distributions over,
for example, clustering coefficients, geodesic distances and diameters. That allows us to study the stability of the overall
structure and the crispness of properties such as community structure, homophily and the small-world effect.

In addition to further development of theoretical methods, we plan to apply the one-mode projection to a variety of
data domains well suited to the concept of temporal bipartite graphs, with highly dynamic and uncertain association
structures. Namely, we have collected a dataset documenting the trading positions of many individual investors over time,
containing both the date and the direction of their position. By treating specific investments within given time ranges as
the "target events" on our network (e.g. buying shares in Microsoft within two weeks of their earnings announcement),
we develop a two-mode network of investors to events.

We have already built a bipartite network of investors and investments, and from it a non-temporal projection for an
association network. That is to say, we trivially have the bipartite network of all stocks and traders if we do
not consider time as a variable, and can project this onto the trader network to provide weighted associations. However,
especially in finance, it is important both to capture the dynamics of associations as they change in time and to quantify
the uncertainty around such associations. Thus we have begun applying the method documented here to generate a
probabilistic, time-dependent network of associations among the traders. Given such a network we can explore concepts such
as homophily [8], assortativity [12] or community structure [16] among investors and their impact on returns. In future work
we plan to run the model over a number of time periods to investigate the ability to detect changepoints, which in this
case correspond to times of rapidly changing opinion or investing style (such as the financial crisis or the announcement
of macro-economic policy shifts on the part of central banks). Preliminary results demonstrate the advantage of such an
approach, but we plan to systematically explore its benefits, and compare it thoroughly to existing methods of detecting
structure among investors.

Moreover, we plan to develop the method further to capture non-network related variables which vary over time but
impact the likelihood of association (in the case of financial markets, one can imagine general market sentiment, or stock
correlation). The hope with this work would be to understand times at which joint appearance at target nodes represents
stronger associations: if two investors make the same decision at times of limited liquidity, we should be able to capture
this information systematically. In this sense, we regard the financial dataset as an ideal testing ground for future work,
as a variety of other variables are already tracked and easily experimented on. More generally, we hope that as we apply
this method to further domains we will also discover deeper nuances which allow us to improve the theory underlying the
model.